ABSTRACT

A concept for an apparatus which visually displays and responds 
to the first and second formant of vowel sounds is developed. The 
machine is intended for use by deaf and speech handicapped children 
in learning to produce voiced sounds. System design and principles 
applied to realize a physical prototype of this concept are presented. 
The complete electronic and mechanical design plus fabrication of the 
automatic electronic speech training responder is described in 
detail. Schematic diagrams of all electronic circuitry employed 
and photographs of the prototype equipment are included. The 
apparatus is on loan to the Monterey Institute for Speech and 
Hearing, Monterey, California, for clinical testing and evaluation. 

1. INTRODUCTION



Man is born with the natural instinct and physical capacity 
to eat and breathe, but he must learn how to speak. This learning 
process depends on good hearing ability during the formative years. 

A child, during initial attempts to speak, constantly monitors his
utterances with his ears. These sensors provide the necessary
information to the brain to modify the vocal tract modulators and
articulators with respect to the points of articulation until the
desired sound, phoneme or word is correctly produced. If this
feedback loop (voice output-ear sensor-brain input) is defective
or nonexistent in a human, another physical sensor must be used
as an alternate feedback path to monitor generated speech sounds
on a real time basis if intelligent and comprehensible
communication is to be achieved. Many devices have been devised
and constructed which transform speech sounds into a visual
display or a tactile signal.

This thesis is directed toward the attempt to process 
specific speech sounds and to display or provide a positive response 
when the desired sound has been correctly produced. In addition, 
the machine must be simple in final output so that it can be easily 
used and interpreted by children. 

Computer sciences have stimulated research into speech recognition
and synthesis. Unfortunately, this type of engineering
technology is too costly and sophisticated at the present time for
application to elementary speech training problems. Rather specific
guidelines on the needs of training devices for children were
developed by Dr. Burl Gray of the Monterey Institute for Speech
and Hearing; these are:

1. A definite need exists for simple, inexpensive devices
which will assist or supplement the speech therapist's 
work with deaf children. These devices would permit the 
instructor to teach more students simultaneously or the 
devices could perform elementary tasks of providing 
various mechanical responses to repetitious articulation 
drills without the constant attention or intervention of the 
speech therapist. 

2. The information display or mechanical response of such an 
apparatus must be in a form which is easily communicable 
to and understood by the child. Careful attention must 
be given to the human-machine interface problem to ensure
good results with a given age group and mental attitude.

3. The apparatus must present the visual or mechanical
response while the child is speaking (i.e., in real time).

Using these criteria, an attempt has been made to design and
construct an apparatus which will respond only to a defined
pronunciation of the basic American vowel sounds. The vowel sounds
were selected for machine recognition because they require the
minimum amount of audio spectral information to be uniquely
identified. However, the approach to this vowel processing
technique is sufficiently general that it may be extended
to process other sounds.

Figure 1 is a graphic representation of a generalized man-machine
speech feedback system.

Figure 1 - Man-Machine Speech Feedback System



2. THE VOWEL SOUND 



A human can produce a multitude of speech sounds by controlling 
his articulators (the tongue and lower lip), the points of articula- 
tion (the upper lip, alveolar ridge, the hard palate, the soft 
palate or velum and lower teeth), and the excitation of his vocal 
bands. The vocal bands, if tensed and therefore vibrating, modulate 
the air stream exhaled from the lungs to establish a category of 
sounds which are classed as voiced sounds. All vowels are voiced
sounds which are excluded from entry into the nasal cavity by a
raised velum and therefore emanate solely from the oral cavity.

It will become apparent that the vowels are constrained to a
small category of speech sounds by definition of the manner in
which they are articulated. In fact, the basic American vowels
consist of 10 phonemes. Table 1 lists the individual sounds with
their phonetic notation and representative words. [10, 30, 33]

Since this thesis is devoted to application of electronic
techniques to speech processing, it is natural to begin with a
machine which will react to the most fundamental sounds, those which
require the minimal spectral information to be recognized or
identified. The vowel can be specified by a minimum of two spectral
parameters in most sound situations. Joint discussions with Dr.
Gray and Dr. Ewing resulted in establishing a mutually acceptable
concept of an electronic vowel teaching machine. This local merger
of ideas from two disciplines proves once again that scientific
boundaries can greatly overlap and the systems engineering approach
to problems may be of great benefit to all concerned.

TABLE 1

VOWEL PHONETIC SYMBOLS AND REPRESENTATIVE WORDS

Typewritten Symbol   IPA
for Vowel            Symbol   Representative Words

IY                   i        heed    beat    eat
I                    ɪ        hid     bit     it
E                    ɛ        head    bet     let
AE                   æ        had     bat     hat
A                    ɑ        hod     calm    father
OW                   ɔ        hawed   fall    lost
U                    ʊ        hood    full    foot
OO                   u        who'd   fool    pool
UH                   ʌ        hud     above   tub
ER                   ɝ        heard   word    hurt

Figure 2. Typical Spectrograms of the vowels by a male voice.

The theory of vowel production can be described in terms of
steady state (or harmonic) conditions with application of
cord-tone-resonance effects. [21] Modern analytical representation
of the same effect can be stated in terms of excitation functions
and convolution techniques. [36] The former description states in
effect that the vocal bands (a modern synonym for cords [3]),
during phonation, set up in the air immediately adjacent to them
a complex motion which consists of a fundamental component, known
as pitch, and a large number of its overtones or harmonics. This
complex air motion constitutes the so-called band-tone. The
theory further states that the vocal cavities, on which the
band-tone acts as a force, have the properties of simple resonators and
thus serve to modify the spectrum of energy flowing from the bands.
In terms of this theory, a vowel sound, as emitted from the mouth,
is due to both selective generation and selective transmission plus
radiation. This sound is composed mainly of harmonic components
of the fundamental, each of which has a determinable magnitude.

For example, the greatest magnitudes of the harmonic components
usually are found to exist for the 6th through 9th component and
13th through 16th component for the particular vowel sound /a/. [21]
Naturally, for other vowels, the oral cavities change in physical
dimensions, thus affecting the resonant properties of these chambers
and hence causing other harmonic components or partials of the
fundamental vibration of the vocal bands to be amplified or
attenuated.

The spectrograph has greatly enhanced the study of speech
sounds and in particular vividly identifies the amplified partial
tones or resonant frequencies uniquely identifiable with each vowel
sound. [32, 33] Figure 2 provides a sketch representing the
spectrographic tracings due to each vowel sound. The dark areas
represent the amplified harmonics of the fundamental pitch of the
voice. Note that these locations are unique for each vowel,
especially for the first and second resonant frequencies. In the
terminology of visible speech the dark bands are called "formant"
regions or "bars" and for reference purposes are designated by
number, the lowest on the frequency scale being bar 1 or first
formant F1, the next bar 2 or second formant F2, etc. In this
thesis, the notation F1 and F2 shall be used to designate the
first and second resonant frequencies of vowel sounds respectively.

The first and second formants are the only two pieces of
spectral information required, in most cases, to identify a particular
vowel. The third formant (F3) is helpful in distinguishing between
overlapping first and second formant frequencies. Potter and
Peterson have suggested that the human ear recognizes vowel sounds,
not by the spectral location of F1 and F2, but rather by the relative
frequency separation or difference between F1 and F2. Table 2
lists the F1, F2 and F3 frequencies for the vowels of Table 1 while
Table 3 lists the relative formant amplitudes. [30] Figure 3 shows
a two dimensional plot of F1 vs F2. [9] This figure is the crux
of the apparatus designed to recognize vowel sounds. Note that
in the F1-F2 plane each vowel has a specific location; it is
also interesting to note that the locations of these sounds correspond
roughly to the position of the tongue in the oral cavity if you
imagine looking at a side view of the head.

The vowel training device does not work on the relative location
of F1 to F2 but rather utilizes an electronic spectral window in the
F1-F2 plane to target a particular vowel sound or, for that matter,
any voiced combination of two oral resonances in this dual formant
plane.
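
The idea of a spectral window in the F1-F2 plane can be illustrated
with a short Python sketch. It is not the thesis circuit, only a
model of the decision it makes: a vowel is accepted when both formants
fall inside a rectangle centered on the target. The class name, the
rectangular shape and the 60 Hz half-widths are assumptions made for
the example; the center frequencies are the male-voice averages for
the vowel in "heed" from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectralWindow:
    """Rectangular target region in the F1-F2 plane (all values in Hz)."""
    f1_center: float
    f2_center: float
    f1_halfwidth: float
    f2_halfwidth: float

    def contains(self, f1: float, f2: float) -> bool:
        """True when both formants lie inside the window."""
        return (abs(f1 - self.f1_center) <= self.f1_halfwidth
                and abs(f2 - self.f2_center) <= self.f2_halfwidth)


# Male-voice averages for the vowel in "heed" (Table 2); the 60 Hz
# half-widths are an assumed starting point, not a thesis value.
HEED_MALE = SpectralWindow(270.0, 2290.0, 60.0, 60.0)

print(HEED_MALE.contains(280.0, 2300.0))   # a close attempt at "heed"
print(HEED_MALE.contains(390.0, 1990.0))   # the vowel in "hid" instead
```

Narrowing the half-widths corresponds to demanding more accurate
articulation before the machine responds.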

TABLE 2

AVERAGES OF FUNDAMENTAL AND FORMANT FREQUENCIES

IPA               Fundamental      First          Second         Third
Symbol   Speaker  Frequency (Hz)   Formant (Hz)   Formant (Hz)   Formant (Hz)

i        M        136              270            2290           3010
         W        235              310            2770           3310
         Ch       272              370            3200           3730

ɪ        M        135              390            1990           2550
         W        232              430            2480           3070
         Ch       269              530            2730           3600

ɛ        M        130              530            1840           2480
         W        223              610            2330           2990
         Ch       260              690            2610           3570

æ        M        127              660            1720           2410
         W        210              860            2050           2850
         Ch       251              1010           2320           3320

ɑ        M        124              730            1090           2440
         W        212              850            1220           2810
         Ch       256              1030           1370           3170

ɔ        M        129              570            840            2410
         W        216              590            920            2710
         Ch       263              680            1060           3180

ʊ        M        137              440            1020           2240
         W        232              470            1160           2680
         Ch       276              560            1410           3310

u        M        141              300            870            2240
         W        231              370            950            2670
         Ch       274              430            1170           3260

ʌ        M        130              640            1190           2390
         W        221              760            1400           2780
         Ch       261              850            1590           3360

ɝ        M        133              490            1350           1690
         W        218              500            1640           1960
         Ch       261              560            1820           2160

TABLE 3

FORMANT AMPLITUDES MEASURED RELATIVE TO /ɔ/

IPA       First          Second         Third
Symbol    Formant (db)   Formant (db)   Formant (db)

i         -4             -24            -28
ɪ         -3             -23            -27
ɛ         -2             -17            -24
æ         -1             -12            -22
ɑ         -1             -5             -28
ɔ         0              -7             -34
ʊ         -1             -12            -34
u         -3             -19            -43
ʌ         -1             -10            -27
ɝ         -5             -15            -20

[Figure 3: plot of F2 (Hz) against F1 (Hz)]

Figure 3. Central Regions of First and Second Formant
Frequencies of the Common American Vowels.

3. AESTR AS A TEACHING AID

During the process of physically realizing a prototype of the
vowel training machine, liberty was taken by the author and his
associates in the electronics laboratory to coin a name for this
device. The result is the Automatic Electronic Speech Teaching
Responder (AESTR). The title should convey the notion that this
machine is not intended to replace a speech therapist but rather to
assist him in his work. AESTR is initially preset through the
oscillator frequency dials and the control knobs located on the
front panel of the device. The child is then placed in a room with
a candy dispenser or some other motivational responder and a
microphone. He is asked to make any sound he cares to. As the
child produces various sounds he should produce the desired sound
in due time. The machine will activate the candy dispenser only when
the child has produced the targeted voiced sound, and the child
will keep trying to repeat the sound in order to maximize his
reward. As the rewards are given more frequently, the teacher is
able to adjust the filter bandwidths on AESTR and narrow the
spectral window of the desired sound, hence increasing the
articulation accuracy required of the child if he is to obtain
his reward.

The child learns to speak desired sounds by communicating
directly with AESTR. However, positive control of the speech
training process is available to the teacher through his ability
to vary six parameters from the front panel of AESTR (F1
bandwidth and sensitivity, F2 bandwidth and sensitivity, pitch
filter, microphone gain).

4. SYSTEM CRITERIA AND DESIGN

The system incorporates two basic electronic functions to
locate and measure, in real time, the first and second formants of
a voiced sound. The speech sound is first mixed with two local
oscillators by means of non-linear devices. One of the components
obtained from this process, the difference frequency between
each formant and its oscillator, is isolated by active low pass
filter circuits. The oscillators and low pass filters are
variable and can be set for a particular sound or spectral
window. By setting the two local oscillators to the known
frequencies for F1 and F2 of a particular vowel, and the low pass
cutoff frequency for the desired degree of accuracy of response,
the machine is able to process the speech sound and provide a
binary decision response.

The responses are:

1. A positive response, which is movement of two voltmeter
indicators and a light being activated if both meters
are at maximum value simultaneously. This condition occurs
when the resonant frequencies of the voice correlate
with the preset local oscillator frequencies
simultaneously; the correct voiced sound is being produced
by the student. The apparatus also has an external
motivation output jack which can operate other reward
machines when the targeted sound is produced by the
student.

2. No response. One or both formant frequencies are not
present or they do not correlate within limits set by
the filter pass band.
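
The mix-then-filter decision described above can be tried out
numerically. The following Python sketch is a simplified digital
model, not the analog circuit of the thesis: the voice is reduced to
two unit-amplitude tones at the formant frequencies, the mixer is an
ideal squarer, the DC blocking capacitor is replaced by subtracting
the mean, and the low-pass filter is a digital 4-pole Butterworth
built from two second-order sections. The sample rate, the 0.5
decision threshold and all function names are assumptions made for
the example.

```python
import math

FS = 8000          # sample rate in Hz (simulation choice, not from the thesis)
DURATION_S = 1.0   # length of the simulated utterance

# Q values of the two second-order sections of a 4-pole Butterworth filter.
BUTTERWORTH_4_Q = (0.5412, 1.3066)


def biquad_lowpass(x, cutoff_hz, q):
    """One second-order low-pass section (bilinear-transform design)."""
    w0 = 2 * math.pi * cutoff_hz / FS
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b0 = (1 - cos_w0) / 2
    b1 = 1 - cos_w0
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b0 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y


def meter_reading(formants_hz, lo_hz, cutoff_hz):
    """Peak low-pass output after square-law mixing with one oscillator."""
    n = int(FS * DURATION_S)
    t = [i / FS for i in range(n)]
    voice = [sum(math.cos(2 * math.pi * f * ti) for f in formants_hz)
             for ti in t]
    mixed = [(v + math.cos(2 * math.pi * lo_hz * ti)) ** 2
             for v, ti in zip(voice, t)]
    mean = sum(mixed) / n                  # stands in for the DC blocking capacitor
    y = [m - mean for m in mixed]
    for q in BUTTERWORTH_4_Q:
        y = biquad_lowpass(y, cutoff_hz, q)
    return max(abs(v) for v in y[n // 2:])  # skip the filter start-up transient


def responds(formants_hz, lo1_hz, lo2_hz, cutoff_hz, threshold=0.5):
    """Binary decision: both channels must exceed the threshold together."""
    return (meter_reading(formants_hz, lo1_hz, cutoff_hz) > threshold
            and meter_reading(formants_hz, lo2_hz, cutoff_hz) > threshold)


# Oscillators set 10 Hz away from the male-voice formants of "hod" (Table 2).
print(responds((730, 1090), 720, 1100, cutoff_hz=30))
# Same oscillator settings, but the speaker produces the vowel in "heed".
print(responds((270, 2290), 720, 1100, cutoff_hz=30))
```

In this model a channel reads high only when the difference between
a formant and its oscillator lies inside the filter pass band, which
is the behavior attributed to the two voltmeter indicators.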

Many methods were considered for realization of this device in
terms of simplicity, cost and expediency. Primary concern was to
produce some type of primitive machine which would do the basic tasks
required by this particular vowel teaching aid. The approach finally
selected for the first attempt is to process the complex speech
waveform in analog form in the audio spectrum. Advantage was taken of
the Field-Effect Transistor (FET), which has an almost perfect square
law response and which is ideally suited for optimum mixing of
oscillator and voice frequencies. The filtering is accomplished by means
of active low-pass filters using the readily available integrated
circuit operational amplifiers.

An additional factor must be considered in AESTR's system design.
The pitch of a human voice can range from approximately 75 to 500 Hz.
[32] The formant frequencies range from approximately 250 to 3000 Hz.
It is necessary to eliminate the pitch frequency from the audio speech
prior to the mixing operation; otherwise it is possible for the pitch
or fundamental frequency to pass directly through the mixers and
filters, thus producing a positive machine response regardless of the
formant and oscillator frequencies present. Figure 4 represents the
basic system approach for realization of this apparatus.

Figure 4. AESTR Electronic Transducer, Detector, and Display System

5. CONTROL PANEL DESIGN AND OPERATION

AESTR is to be operated by individuals who do not possess
an engineering background. Therefore the panel is designed
to be self explanatory and requires minimum instruction for
operation. The controls are fairly large to permit positive
grasp by the operator. Also, the functional location of the knobs
and visual indicators is evident from the partition lines. The
objective is to have the panel functions reflect the needs of
the operator rather than the requirements of the internal
circuitry. AESTR's control panel is shown in Figure 5.

The "Volume" control is self explanatory and permits the
operator to vary the gain of the preamplifier circuit.

The "pitch" control permits selection of four cutoff
frequencies of the high-pass filter circuit in order to suppress
the fundamental frequency of a voice while passing all the
formant frequencies. In Table 4 below, the letter positions
are identified with the 3 db cutoff frequencies of the
high-pass filter.



TABLE 4

HIGH-PASS FILTER CUTOFF FREQUENCIES

Pitch control position    Frequency (Hz)

A (male voice)            75
B (female voice)          190
C (child's voice)         450
D (special use)           1050



The pitch control setting is not critical for the back vowel 
sounds such as OW in the word "father" and can remain in the 
"A" or "B" setting for all speakers regardless of sex or age. 

Figure 5. AESTR Control Panel

Position "D" is used when working with the central vowel such
as ER in the word "bird". It is necessary to suppress the
frequencies below 1 kHz and operate with the second and/or third
formants in order to have the machine respond only to this
particular vowel. This technique was developed during the
testing of AESTR and is discussed further in Section 13.

The "F1 LPF cutoff" and "F2 LPF cutoff" controls set the cutoff
frequencies of the variable low-pass filters. The controls are
located above the first and second formant voltmeter indicators
respectively. Cutoff frequencies of 10, 15, 30 and 60 Hz are printed
around the periphery of the control knobs. Normally the controls are
initially set in the 60 Hz position when searching for a voice
formant. This setting provides the widest possible filter
pass band, such that the voltmeter needle will begin to deflect
up scale whenever the oscillator and voice formant are within
±60 Hz of each other. As the two frequencies become more
nearly coincident, the voltmeter needle will show a maximum scale
deflection. When the operator has the oscillator set at a
frequency which gives the maximum needle deflection, he may
elect to switch the "LPF cutoff" control to 30 Hz in order to
narrow the filter response pass band. It may be necessary
to readjust the local oscillator slightly for maximum scale
deflection. This procedure can be continued for the 15 and
10 Hz cutoff frequencies respectively.
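
The narrowing procedure can be summarized in a few lines of Python.
This is an idealized sketch only: it treats each filter as a
brick-wall pass band, so the meter is taken to deflect whenever the
oscillator lies within the selected cutoff of the formant. The real
four-pole filters roll off gradually, and the function name is
hypothetical.

```python
CUTOFFS_HZ = (60, 30, 15, 10)   # the four switch positions on the panel


def narrowest_passing_cutoff(lo_hz, formant_hz):
    """Smallest cutoff setting at which the meter would still deflect.

    Returns None when the oscillator is too far from the formant even
    for the widest (60 Hz) setting.
    """
    offset = abs(lo_hz - formant_hz)
    passing = [c for c in CUTOFFS_HZ if offset <= c]
    return min(passing) if passing else None


print(narrowest_passing_cutoff(700, 730))   # 30 Hz away from the formant
print(narrowest_passing_cutoff(728, 730))   # oscillator nearly on target
print(narrowest_passing_cutoff(600, 730))   # outside even the widest window
```

Each readjustment of the oscillator reduces the offset, which is what
allows the operator to step down to the next narrower setting.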

The "F1 Filter Sensitivity" and "F2 Filter Sensitivity"
controls vary the gain of the filters. The word "sensitivity"
is chosen for contrast against the "volume" control nomenclature
and is intended to prevent any misunderstanding between the two
types of controls. The "filter sensitivity" controls are
adjusted to make the needle deflect to full scale when
the local oscillator and formant frequencies most nearly
coincide. Each vowel sound will have its unique "filter
sensitivity" setting due to the varying intensity levels of the
formants of the individual phonemes. The operator must determine
these settings empirically since the sensitivity is also a
function of the intensity of the speaker's voice. It is
advisable to keep the "volume" control knob at a minimum setting
and the "sensitivity" control knobs at a high setting to reduce
the effects of acoustic and electrical noise.

The "correct" green light illuminates when both formant
indicators read an up scale deflection of 7 volts. Light
activation is delayed 200 milliseconds and, once lighted, the light
stays on for a period of 2 seconds. The delay prevents the light
from being activated by transient full scale deflections which
occur from plosive type consonant sounds preceding a vowel
in such a word as "bar". The light hold time of 2 seconds
prevents the light from flickering if the voice begins to
quiver during articulation of a phoneme.
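
The delay and hold behavior of the light can be modeled as a small
state machine. The Python sketch below is one possible reading of the
description, not the thesis circuit: the delay is taken as 0.2 second,
and the hold is assumed to be retriggerable, so the light goes out 2
seconds after the last instant at which the condition was satisfied.
The class and parameter names are hypothetical.

```python
class CorrectLight:
    """Delay-then-hold logic for the "correct" indicator (times in seconds)."""

    def __init__(self, delay_s=0.2, hold_s=2.0):
        self.delay_s = delay_s
        self.hold_s = hold_s
        self._above_since = None   # when both meters last went to full scale
        self._on_until = None      # time at which the light will go out

    def update(self, t, both_full_scale):
        """Feed one observation at time t; return True while the light is on."""
        if both_full_scale:
            if self._above_since is None:
                self._above_since = t
            if t - self._above_since >= self.delay_s:
                # Assumed retriggerable hold: extend from the latest hit.
                self._on_until = t + self.hold_s
        else:
            self._above_since = None
        return self._on_until is not None and t < self._on_until


light = CorrectLight()
print(light.update(0.00, True))    # condition just met, still within the delay
print(light.update(0.25, True))    # held past the delay, light comes on
print(light.update(1.00, False))   # voice quivers, light is held on
print(light.update(2.50, False))   # hold time expired, light is off
```

A short plosive burst never reaches the delay time, so it leaves the
light off, which is the purpose the text gives for the delay.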

In the rear of the AESTR cabinet is located an ordinary
female 115 volt receptacle. Any external motivational device,
such as an M&M candy dispenser, can be attached to this
terminal and will be operated automatically since the terminal
provides 115 volts only during the interval when the "correct"
light is illuminated.

AESTR also has the capability of measuring the pitch of
a person's voice. Turn the "pitch" control clockwise beyond
[...] to 10 Hz and the "F1 Sensitivity" control to a maximum value of
10. Turn the "F2 Sensitivity" full counterclockwise to a value
of 0. Sweep the F1 local oscillator through a range of 60 to
500 Hz. The speaker pitch will be read on the F1 oscillator
frequency setting when the first formant visual indicator has
a maximum up-scale deflection.



6. PREAMPLIFIER DESIGN 

The mixer circuit is able to accept a maximum input signal
of 0.8 volts peak to peak. A preamplifier is necessary,
especially if a dynamic microphone is being used, to amplify the
voice sound for maximum mixer output. The Fairchild µA709
operational amplifier was selected to perform this function
utilizing the standard feedback configuration and necessary
frequency compensation. It is shown schematically in Figure 6.

The µA709 comes in an epoxy TO-5 configuration. The
detailed circuitry employed in this integrated circuit and
performance data are readily available from the manufacturer. [11]
The price of this device is not considered to be excessive
at the present time.

Note: The letter-number combination inside each square identifies
the circuit board by the letter and the terminal of the board
by the number.

Figure 6. Preamplifier Schematic

7. MIXER CIRCUIT

The transfer characteristic of a field-effect transistor
(FET) made by the diffusion process has a square-law
relationship between the drain to source current, $I_{ds}$, and the gate
to source voltage, $V_{gs}$. It is expressed as [25]

\[
I_{ds} = I_{DSS}\left(1 - \frac{V_{gs}}{V_p}\right)^{2} \qquad (1)
\]

where $I_{DSS}$ is the saturation drain current when the gate is shorted
to the source ($V_{gs} = 0$) and $V_p$ is the pinch off voltage. For
mixer operation let $V_{gs}$ be represented as the sum of two
sinusoidal voltages, both of which can be simultaneously impressed
on the gate of an FET, or one impressed on the gate and the
other on the source of the FET. Either process will cause
mixing operation, and $V_{gs}$ as defined below holds true for both
cases:

\[
V_{gs} = V_{GS} + V_s\cos\omega_s t + V_o\cos\omega_o t \qquad (2)
\]

where $V_{GS}$ is the bias gate to source voltage, $V_s\cos\omega_s t$
represents the source or voice sound, and $V_o\cos\omega_o t$ is the
sinusoid generated by the local oscillator which is applied
to the gate or source of the FET. Substituting (2) into (1)
and expanding, we obtain

\[
\begin{aligned}
I_{ds} = \frac{I_{DSS}}{V_p^{2}} \Big[ \; & V_p^{2} - 2V_pV_{GS} + V_{GS}^{2} + \tfrac{1}{2}V_s^{2} + \tfrac{1}{2}V_o^{2} \\
& - 2(V_p - V_{GS})(V_s\cos\omega_s t + V_o\cos\omega_o t) \\
& + \tfrac{1}{2}V_s^{2}\cos 2\omega_s t + \tfrac{1}{2}V_o^{2}\cos 2\omega_o t \\
& + V_sV_o\cos(\omega_s + \omega_o)t + V_sV_o\cos(\omega_s - \omega_o)t \; \Big] \qquad (3)
\end{aligned}
\]
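
The content of the square-law expansion can be checked numerically.
The Python sketch below applies the square law of (1) to the sum of
a bias and two sinusoids and lists the frequencies at which the
result has energy. The bias and signal amplitudes are illustrative
assumptions chosen to keep the gate voltage between pinch off and
zero; frequencies are expressed in whole cycles per record so that
each component falls exactly on one bin of the discrete Fourier
transform.

```python
import cmath
import math

N = 256                    # samples in one record
KS, KO = 20, 27            # voice and oscillator frequencies, cycles per record
IDSS, VP = 10e-3, -8.0     # saturation current (A) and pinch off voltage (V)
VGS_BIAS, VS, VO = -5.5, 0.4, 0.4   # assumed bias and signal amplitudes (V)


def drain_current(vgs):
    """Square-law transfer characteristic of equation (1)."""
    return IDSS * (1 - vgs / VP) ** 2


samples = [
    drain_current(VGS_BIAS
                  + VS * math.cos(2 * math.pi * KS * n / N)
                  + VO * math.cos(2 * math.pi * KO * n / N))
    for n in range(N)
]


def dft_magnitude(x, k):
    """Magnitude of DFT bin k, normalized by the record length."""
    total = sum(xn * cmath.exp(-2j * math.pi * k * i / len(x))
                for i, xn in enumerate(x))
    return abs(total) / len(x)


# Every non-DC bin with energy well above numerical round-off.
present = sorted(k for k in range(1, N // 2)
                 if dft_magnitude(samples, k) > 1e-9)
print(present)   # [7, 20, 27, 40, 47, 54]
```

The six bins are the difference (7), the two inputs (20 and 27), the
two second harmonics (40 and 54) and the sum (47); an ideal square law
produces no other products.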




The drain current has DC components plus six individual
frequencies as a result of the square law mixing of an FET. This
response shows that only frequencies of the form $\omega_s$, $\omega_o$,
$2\omega_s$, $2\omega_o$, $\omega_s + \omega_o$, and $\omega_s - \omega_o$
are obtained, while other frequencies of the form
$m\omega_s \pm n\omega_o$, which must be suppressed in conventional
mixers, are greatly reduced with an FET mixer circuit.

The frequency component of the drain current which is of
interest is $V_sV_o\cos(\omega_s - \omega_o)t$. It is separated from the
other components by coupling the mixer output through a DC blocking
capacitor followed by a low-pass filter which has a cutoff
frequency $\omega_f$ such that $\omega_s - \omega_o < \omega_f <$ both
$\omega_s$ and $\omega_o$.

Initially, a dual gate metal oxide semiconductor (MOS)
FET was selected as being particularly well suited for use in
mixing two audio frequencies. Eight 3N141 MOS-FET's were ordered
but, due to excessive delay in receipt of these devices, it was
necessary to design and build a mixer using a single gate
FET already in stock in the school electronics issue room.
This device requires that the voice signal be impressed on the
gate while the local oscillator signal is applied to the source
terminal. Several types of FET's available from the issue room
were tested for mixing action in the circuit shown in Figure 7a.
The 2N3819 proved to be the most satisfactory device. Its
transconductance as a function of gate to source voltage is
quite linear over the range from zero $V_{gs}$ to pinch off voltage
$V_p$. This characteristic enhances the mixing action of an FET. [20]

The local oscillator used in AESTR is a URM-127 signal
generator. It has an output impedance of approximately 100
ohms and can deliver a signal ranging from the microvolt range
to a maximum of 10 volts.

In designing the mixer circuit the author relied on the
manufacturer's data sheet for the 2N3819 FET. It is an
N-channel device with $V_p = -8$ volts, $I_{DSS} = 10$ ma and an average
transconductance of 4000 micromhos for zero gate bias. The DC
drain current $I_D$ was selected to be 1 ma and the mixer circuit
was designed to give a voltage gain of 10. These conditions
were incorporated into the circuit design [25] and values were
obtained such that $R_S = 5.5$ kohms and $R_D = 8.1$ kohms. The
network in Figure 7a was constructed and the components
subsequently modified to the circuit of Figure 7b to obtain
optimum mixing.

Figure 7b. AESTR Mixer Schematic
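
The source resistor value quoted above can be approximated from the
square law alone. The Python sketch below is a back-of-envelope check
under stated assumptions, not the design procedure of the cited
reference: it assumes simple self-bias with the gate at DC ground, so
that the gate to source voltage equals minus the drain current times
the source resistance, and it estimates the drain resistor by taking
the gain as transconductance times drain resistance. The function
names are hypothetical.

```python
import math

IDSS = 10e-3    # saturation drain current, A (data-sheet value in the text)
VP = -8.0       # pinch off voltage, V
GM0 = 4000e-6   # transconductance at zero gate bias, mhos
ID = 1e-3       # chosen DC drain current, A
GAIN = 10.0     # desired voltage gain


def bias_voltage(i_d, idss, vp):
    """Gate to source voltage that gives drain current i_d under the square law."""
    return vp * (1 - math.sqrt(i_d / idss))


def transconductance(gm0, vgs, vp):
    """Square-law transconductance, falling linearly toward pinch off."""
    return gm0 * (1 - vgs / vp)


vgs = bias_voltage(ID, IDSS, VP)
rs = -vgs / ID                        # self-bias assumption: Vgs = -Id * Rs
gm = transconductance(GM0, vgs, VP)
rd = GAIN / gm                        # rough estimate: gain = gm * Rd

print(f"Vgs = {vgs:.2f} V, Rs = {rs:.0f} ohms, Rd = {rd:.0f} ohms")
```

The estimate for the source resistor comes out close to the 5.5 kohms
quoted in the text; the drain resistor estimate is only near the
quoted 8.1 kohms, which suggests the original procedure included
considerations not captured by this simple model.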

Successful mixing of any two audio frequencies is
accomplished by means of this circuit with no lower limit on the
input and local oscillator voltages. An upper limit of 1.5
volts peak to peak for the signal and local oscillator voltages
cannot be exceeded; otherwise the output is clipped. Optimum
operation of this mixer circuit is set for an input of
approximately 0.8 volts peak to peak. Above this voltage,
the follow-on filter circuits begin to give spurious outputs
due to sweeping of either the voice oscillator or local
oscillator across the frequency spectrum. This effect is
noticeable on the F1 and F2 voltmeter indicators and masks the
frequency response of the filters.

The 2N3819 FET's have consistently performed the mixing
operation on a daily basis during the entire period covering
the design and testing of the formant indicators. These
particular FET's are highly recommended both for their reliability
and usefulness in audio mixing circuits.

As an epilog to the mixer design realization, the 3N141
MOS-FET's did arrive finally. Other students have had limited
success in using these devices for mixing. Special care must
be exercised in using them, especially with regard to preventing
any external high voltages (static charges, etc.) from
accidentally damaging the devices.

8. FILTER CONSIDERATIONS

Many types of network designs will yield either low-pass,
high-pass or band-pass frequency filters. The networks may be
synthesized using only passive elements [18, 19] or, in addition
to resistive or capacitive components, incorporate a radio tube,
[35] transistor, [15] or an integrated circuit operational
amplifier. When a filter design calls for cutoff frequencies
below 100 Hz, several considerations tend to indicate that an
active RC filter circuit is the most desirable type. Table 5
lists the relative characteristics of passive and active filters
with cutoff frequencies below 100 Hz.

Active network synthesis can be classified in a number of
ways, depending on the purpose of active elements and the network
configuration. The three main types of active synthesis consist
of:

a. Classical Amplifier Design, where the active element is
part of the parameters of the network.

b. Feedback Systems, where feedback theories are used to
synthesize poles and zeros of a network function. In this case
active elements are used as isolation or amplification devices,
or as functions of operational amplifiers.

c. Modification of Passive Synthesis, where techniques of
passive synthesis are used to realize portions of a network that
are connected together by active elements.

In all three categories listed, the active elements
are used mainly as controlled-source devices which perform
functions of subtraction, negative-constant multiplication or
inversion. They can be treated as black boxes performing their
prescribed mathematical functions. [38]

TABLE 5

COMPARISON OF PASSIVE AND ACTIVE FILTER CHARACTERISTICS
FOR LOW FREQUENCY APPLICATIONS

Passive RLC or RC Filters

1. Inductors tend to be expensive, large, heavy and susceptible
to hum (A-C line frequency) pickup.

2. For filters consisting of only resistors and capacitors,
the poles and zeros of the driving-point immittance functions
of RC filters are restricted to the negative real axis of
the s plane, and the same is true for the poles of the transfer
functions.

3. A maximum attenuation of 6 db per octave can be obtained
with each individual RC filter section.

4. RC filters exhibit attenuation of the signal in the
designed pass band.

Active RC Filters

1. An inductor can be replaced by an active circuit which has
an appropriate input impedance.

2. It is possible to use an active element as a cathode or
emitter follower or an operational amplifier in synthesis of
the filter.

3. It is possible to realize driving-point functions and
transfer functions with no restriction on the poles and
zeros.

4. Positive pass band gain can be designed into the circuit.

5. Simpler network configurations at lower cost can be
achieved.

The ideal low-pass filter with unity transmission below
and zero above a certain frequency, with no phase shift in the
pass band, is unattainable in the real world. Three
approximations to the ideal filter can be realized by means of the
Butterworth, Bessel or Chebyshev filters. [14]

The Bessel filter exhibits maximally flat time delay
(linear phase) and therefore is sometimes used as a time delay
network. Its amplitude response in the pass band is monotonically
decreasing rather than flat. Its rate of fall beyond cutoff is
less than that of the Butterworth or Chebyshev filters.

The Chebyshev class of filters has an equal magnitude
ripple in the pass band and the maximum rate of fall beyond the
3 db cutoff. The response of the filter at the cutoff frequency
is always that of a minimum of the ripple. The allowable degree
of ripple in the pass band can be accounted for in the filter
design.

The Butterworth filter is obtained by locating the poles
of the network in accordance with the zeros of the Butterworth
polynomial. The normalized transfer function is of the form

$$ |Z_{12}(jw)|^2 = \frac{1}{1 + w^{2n}} $$

where n is the number of poles in the network and w is the ratio
of the frequency of interest to the cutoff frequency. The filter
has a maximally flat amplitude response in the pass band and the
slope of rolloff outside the pass band increases directly with
the number of poles in the transfer function. The response falls
off at approximately a constant 6n db/octave. The phase
characteristics of the Butterworth filter are not very linear.
The time delay varies as a function of frequency.
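The 6n db/octave figure can be checked numerically from the
normalized Butterworth form. The following is a minimal sketch,
not part of the original design work; the function name is
illustrative only.

```python
import math

def butterworth_gain_db(w, n):
    """Gain in db of the normalized n-pole Butterworth response.

    w is the ratio of the frequency of interest to the cutoff frequency.
    """
    return -10.0 * math.log10(1.0 + w ** (2 * n))

# Attenuation at cutoff and the drop over one octave well above cutoff.
for n in (2, 4):
    at_cutoff = butterworth_gain_db(1.0, n)
    per_octave = butterworth_gain_db(8.0, n) - butterworth_gain_db(16.0, n)
    print(n, round(at_cutoff, 2), round(per_octave, 2))
```

For any n the response is about 3 db down at w = 1, and for
n = 4 the slope well above cutoff approaches 24 db per octave.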
9. LOW-PASS FILTER DESIGN

The F1 low-pass filter and the F2 low-pass filter in AESTR
are identical circuits. Each filter is a four pole Butterworth
response circuit with discrete cutoff frequencies of 10, 15,
30 and 60 Hz. The Rauch type filter network is selected since
the circuit values are rapidly calculated for multiple filter 
sections by using the normalized tables contained in Foster's 
paper. [l^3 Also this network can be modified to provide a 
continuous variable cutoff frequency or have positive gain by 
modifying the resistive elements of the circuit. [28] In AESTR,
the filters have unity gain and the cutoff frequencies are
established by switching various capacitor values into the
network while maintaining all resistor values at a constant value
of 10K. The author decided to vary the capacitors rather than
the resistors to control cutoff frequencies because of hardware
considerations. As more data and experience are gained in
the operation of AESTR, it may be desirable to design positive
gain and continuously variable cutoff frequency into the filters
based on recommendations of the speech therapists. Each filter
is mounted on a separate circuit board and modifications can
be accomplished without changing the internal chassis wiring.

The Rauch filter basic building block is a single section
which has two poles in the complex frequency plane. Its schematic
and transfer function are shown in Figure 8. Two cascaded
sections are required to obtain a roll-off of 24 db per octave
for frequencies above the cutoff frequency. A 25 uf coupling
capacitor is inserted between sections to block D.C. components
while a 10K shunt resistor is inserted at the input of each
section to provide a D.C. return path to the base of the
inverting input transistor enclosed in the Fairchild uA 709
operational amplifier. This resistor also develops the required
input voltage necessary for proper filter response. All filter
network resistors are fixed at 10 K ohms to provide an adequate
filter impedance match to the mixer output and to determine
practical capacitor values which can be obtained for fabrication
of the network.

$$ \frac{E_{out}}{E_{in}} = \frac{\dfrac{1}{R_1 R_2 C_1 C_2}}{s^2 + \dfrac{R_2 R_3 + R_1 R_3 + R_1 R_2}{R_1 R_2 R_3 C_1}\, s + \dfrac{1}{R_2 R_3 C_1 C_2}} $$

Figure 8

Two Pole Rauch Low-Pass Filter and Transfer Function
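As a rough numerical check, the Figure 8 transfer function can
be evaluated for two cascaded sections. This sketch is not from
the thesis. It assumes the computed 10 Hz capacitor values of
Table 6, that C1 and C2 belong to the first section and C3 and
C4 to the second, and it ignores the coupling capacitor and
shunt resistor between sections. The function names are
illustrative only.

```python
import math

R = 10e3  # all filter network resistors are 10 K ohms

def rauch_section(f_hz, c1, c2, r1=R, r2=R, r3=R):
    """Complex gain of one two-pole section, using the Figure 8 expression."""
    s = 2j * math.pi * f_hz
    numerator = 1.0 / (r1 * r2 * c1 * c2)
    s_coeff = (r2 * r3 + r1 * r3 + r1 * r2) / (r1 * r2 * r3 * c1)
    constant = 1.0 / (r2 * r3 * c1 * c2)
    return numerator / (s * s + s_coeff * s + constant)

def filter_gain_db(f_hz, caps):
    """Gain in db of two cascaded sections; caps = ((C1, C2), (C3, C4)) in farads."""
    (c1, c2), (c3, c4) = caps
    h = rauch_section(f_hz, c1, c2) * rauch_section(f_hz, c3, c4)
    return 20.0 * math.log10(abs(h))

# Computed 10 Hz column of Table 6, converted from uf to farads.
CAPS_10HZ = ((6.25e-6, 0.41e-6), (2.58e-6, 0.98e-6))

for f in (1.0, 10.0, 20.0, 40.0):
    print(f, round(filter_gain_db(f, CAPS_10HZ), 2))
```

Under these assumptions the cascade is close to unity gain well
below 10 Hz, about 3 db down at 10 Hz, and falls roughly 24 db
between 20 Hz and 40 Hz.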

From the table of normalized capacitor values for a Butterworth
filter with four poles, [l^ , it is a simple matter to
calculate capacitor values for various low-pass filter cutoff
frequencies. The calculated and actual values used in the
AESTR apparatus are listed in Table 6. Although the actual
capacitor component values deviated from the calculated values,
the filter response is quite satisfactory. Figure 9 is a plot
of the frequency response curves of the low-pass filters in
AESTR.
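The computed capacitor values can also be derived directly
rather than read from normalized tables. The sketch below is an
alternative to the table lookup used by the author, not a
description of it. It assumes the Figure 8 transfer function
with R1 = R2 = R3 = R, for which the section has natural
frequency 1/(R sqrt(C1 C2)) and Q = sqrt(C1/C2)/3, together with
the standard Butterworth section Q values. The function names
are illustrative only.

```python
import math

def butterworth_section_qs(n_poles):
    """Q of each two-pole section of an even-order Butterworth filter, highest first."""
    qs = [1.0 / (2.0 * math.cos((2 * k + 1) * math.pi / (2 * n_poles)))
          for k in range(n_poles // 2)]
    return sorted(qs, reverse=True)

def rauch_capacitors_uf(cutoff_hz, r_ohms=10e3, n_poles=4):
    """Capacitors in uf for equal-resistor Rauch sections, ordered C1, C2, C3, C4."""
    wr = 2.0 * math.pi * cutoff_hz * r_ohms
    caps = []
    for q in butterworth_section_qs(n_poles):
        caps.append(3.0 * q / wr * 1e6)          # capacitor that sets the s term
        caps.append(1.0 / (3.0 * q * wr) * 1e6)  # second capacitor of the section
    return caps

for cutoff in (10, 15, 30, 60):
    print(cutoff, [round(c, 3) for c in rauch_capacitors_uf(cutoff)])
```

With R = 10K the results agree with the computed rows of Table 6
to within a few percent, the differences being consistent with
rounding in the table.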

The various capacitors are mounted on a five pole two gang
switch which is operated from the front panel of AESTR. The
ten inch cable wires between capacitors and circuit boards do
not contribute any noticeable adverse effect on the filter
response.

A zero output response is observed for zero beat frequency
output of the mixer stage due to the coupling capacitors of
the filter. This effect does not affect the purpose for which
AESTR is to be used since it is practically impossible for a
person to hold his vowel formants exactly on frequency with
the local oscillators. The continuous deviations of the formants
are sufficient to cause beat frequencies which will be present
in the pass bands of the filters.

TABLE 6

CAPACITOR VALUES OF FOUR POLE BUTTERWORTH RESPONSE RAUCH
TYPE LOW-PASS FILTER WITH ALL RESISTOR VALUES SET AT 10K OHMS

                              Cutoff Frequency (Hz)
Capacitor Value (uf)       10       15       30       60

C1    computed            6.25     4.17     2.08     1.04
      actual              8.2      4.0      2.0      1.0

C2    computed            0.41     0.27     0.14     0.068
      actual              0.4      0.22     0.13     0.068

C3    computed            2.58     1.73     0.86     0.43
      actual              4.0      1.5      0.8      0.4

C4    computed            0.98     0.65     0.33     0.164
      actual              1.0      0.8      0.4      0.168

Figure 9. Low-Pass Filter Response Curves

The filter output is passed into an amplifier using a 1 K ohm
input resistor and a 1 Megohm potentiometer in the feedback circuit
across a uA 709 operational amplifier. The potentiometer control is
designated as "filter sensitivity" on the front panel of AESTR.
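The range of the "filter sensitivity" control can be estimated
from the two resistor values. The sketch below assumes an ideal
inverting amplifier whose gain magnitude is the feedback
resistance divided by the input resistance; that idealization
and the function names are not taken from the thesis.

```python
import math

R_INPUT = 1e3          # 1 K ohm input resistor
R_FEEDBACK_MAX = 1e6   # 1 Megohm potentiometer in the feedback circuit

def sensitivity_gain(pot_fraction):
    """Ideal gain magnitude for a potentiometer setting from 0.0 to 1.0."""
    return pot_fraction * R_FEEDBACK_MAX / R_INPUT

def to_db(gain):
    """Voltage gain expressed in db."""
    return 20.0 * math.log10(gain)

for fraction in (0.01, 0.1, 1.0):
    g = sensitivity_gain(fraction)
    print(fraction, g, round(to_db(g), 1))
```

Under this idealization the control spans gains up to 1000, or
60 db, at the full potentiometer setting.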

As stated previously, two identical low-pass filters are contained
in the AESTR system. One filter responds to the first formant beat
frequency and the other responds to the second formant beat frequency
generated in their respective mixer circuits. Figure 10a is a
schematic of the complete filter network while Figure 10b is a schematic
of the beat frequency amplifier which drives a 0-10 volt rectifying
voltmeter. Several types of meters were considered for use as
visual indicators of the beat frequency. The meters used in AESTR
were selected simply because they were available in the stockroom
and adequately served AESTR's purpose.

In Figures 10a and 10b, the uA 709 operational amplifiers
are frequency compensated in the same manner shown in the
preamplifier schematic of Figure 6. The compensation components
have been omitted from the filter and amplifier circuits for the
sake of clarity. Also the schematics identify terminals associated
with circuit board B. Circuit board C is identical to B with
respect to all terminal connections and component values.

Figure 10a. Four Pole Rauch Low-Pass Filter Schematic

Figure 10b. Low-Pass Filter Gain Schematic and Beat Frequency Indicator
10. HIGH-PASS FILTER DESIGN

In order to prevent the fundamental frequency of the vocal
bands from passing directly through the mixer and low-pass
filter circuits, a high-pass filter network is inserted between
the preamplifier and mixers. Its configuration is realized
by the Sallen and Key method. A high-pass filter has a
normalized frequency transfer function of

$$ \frac{E_{out}}{E_{in}} = \frac{s^2}{s^2 + ds + 1} $$

where d is the damping factor. This type of response is obtained
from the basic high-pass filter network of Figure 11.

Figure 11. Basic High-Pass Filter Network

Such a filter will give a 12 db per octave roll-off for
frequencies below the cutoff frequency. R1, R2, and C2 and the
gain of the emitter follower act together to determine the cutoff
frequency and the shape of the response curve during the
transition from the stop band to the pass band. In the actual
circuit, R2 is equal to the resistance of three parallel
resistors. These are the two bias resistors and the input
resistance of the 2N226 transistor.
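The normalized high-pass expression can be evaluated to confirm
the 12 db per octave slope. The sketch below treats each section
as ideal and isolated, and assumes the Butterworth value
d = sqrt(2) for the damping factor. It does not model loading
between sections or the emitter follower gain, and the function
name is illustrative only.

```python
import math

def highpass_gain_db(w, d=math.sqrt(2.0), sections=1):
    """Gain in db of identical ideal sections s^2 / (s^2 + d s + 1) at ratio w."""
    s = 1j * w
    h = (s * s) / (s * s + d * s + 1.0)
    return sections * 20.0 * math.log10(abs(h))

# Slope over one octave well below cutoff, for one and for two sections.
for sections in (1, 2):
    slope = (highpass_gain_db(0.1, sections=sections)
             - highpass_gain_db(0.05, sections=sections))
    print(sections, round(highpass_gain_db(1.0, sections=sections), 2), round(slope, 2))
```

One ideal section is about 3 db down at cutoff and rolls off at
about 12 db per octave; two ideal, non-interacting sections
would give about 24 db per octave.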

The AESTR high-pass filter schematic is shown in Figure 12.
An emitter follower drives the high-pass filter stage which
consists of two cascaded sections to yield an expected
attenuation of 24 db per octave. [16] The individual sections do
yield a Butterworth response of 12 db per octave roll-off,
but, when cascaded together, a total roll-off of only 20 db
per octave is realized with an additional +3 db hump at the
corner frequency. The actual response shown in Figure 13 is
considered satisfactory for the pitch elimination function
in AESTR's system.

Note that the pitch eliminator has four discrete cutoff
frequencies of 75, 190, 450 and 1050 Hz. The desired cutoff
is obtained by switching in various capacitors mounted on a
five pole, two gang switch attached to AESTR's front panel.
The fifth position permits the high-pass filter to be bypassed
so that AESTR can be used to discriminate between voiced and
unvoiced consonants. This feature was incorporated into the
apparatus after Dr. Gray operated a breadboard version of the
system and suggested that a "pitch" or "no pitch" capability
be incorporated into AESTR.
Figure 12. Two Section High-Pass Filter Schematic

Figure 13. High-Pass Filter Response for Pitch-Selector
Switch Positions



11. DECISION AND RESPONSE CIRCUIT DESIGN 



The beat frequency output of the first and second formant
filters is applied to terminals 3 and 20 of circuit board D
whose schematic is shown in Figure 14. These waveforms are
half-wave rectified and smoothed by a passive low-pass RC
filter. The resultant D.C. voltages are impressed on the two
input gates of an AND circuit. When both F1 and F2 beat frequency
rectified voltages are simultaneously present and also of
sufficient magnitude to cause +7.5 volts D.C. to appear on each
diode of the AND gate, the diodes become reverse biased thereby
directing a 400 microampere current into the base of the 2N2924
transistor. This action drives the transistor into saturation,
permitting a collector current of 30 milliamperes to flow
through the relay coil, which acts as the load for the circuit,
and closes the relay contacts. A zener diode is inserted at
the base terminal of the transistor to prevent the transistor
from being switched on when only one diode of the AND gate is
reverse biased.
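The decision rule described above reduces to a two-input
threshold test. The following is a behavioral sketch only; it
models the AND condition on the two rectified voltages and not
the diode, zener or transistor circuitry. The function name and
the sample voltages are hypothetical.

```python
THRESHOLD_VOLTS = 7.5  # D.C. level required on each diode of the AND gate

def relay_closed(v_f1, v_f2, threshold=THRESHOLD_VOLTS):
    """True only when both rectified beat voltages reach the threshold."""
    return v_f1 >= threshold and v_f2 >= threshold

# Truth table for absent (0 V) and present (8 V, hypothetical) beat signals.
for v_f1, v_f2 in ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)):
    print(v_f1, v_f2, relay_closed(v_f1, v_f2))
```

Only the last case, with both formant channels above threshold,
closes the relay.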

The relay is a stockroom surplus item which operates on
14 volts and 25 milliamperes. It has two sets of contacts.
One set activates the green panel "correct" light and the other
set connects a 115 volt supply to the appliance socket mounted
on the rear chassis of AESTR. The Monterey Institute for Speech
and Hearing does have a 115 volt relay operated device which
dispenses M&M candy disks to children when they perform desired
tasks. AESTR is able to operate this dispenser or any other
115 volt device in response to the desired articulation of
the child.
Figure 14. Decision and Response Circuit



12. FABRICATION 



Economy and availability of supplies dictated construction
of AESTR. All components are housed in an aluminum case 16"
wide, 12" deep and 10" high. The control panel is inclined
20° from the vertical so that the values of the control settings
can be read with greater ease. The case was handmade in the
student metal shop. In addition the control panel was rubbed
with emery paper until the metal acquired a satin finish.

The chassis for circuit components has four 22 terminal
sockets which accept the standard by 6" circuit boards.
Also mounted on the chassis is an 11 pin socket for the power
supply package, mounting holes for the relay plus an octal
socket for power distribution cables and a 2? pin socket for
signal distribution cables which originate from the components
mounted on the rear of the control panel. Fusing is provided
for circuit protection.

The circuit boards are identified by letters as follows:
Board A   Preamplifier, High-Pass filter, Mixers
Board B   F1 Low-Pass filter, amplifier
Board C   F2 Low-Pass filter, amplifier
Board D   Rectifier, AND circuit, transistor switch
The functional segregation of the circuit boards permits future
changes to the circuitry by simply replacing an entire board.
It should not be necessary to change the internal wiring of
the chassis for such modifications.

Original circuit boards used for mounting of components
were the etched contact plugboards Vector #838PWE. They are
considered to be restrictive in flexibility. The Vector
#838 fsai-^etched boards proved to be more versatile. Components
are mounted easily and securely with the aid of metal washers
riveted on the holes through which the lead wires pass
to the other side of the board. Additional holes must be
drilled into the board to accommodate the integrated circuit
octal socket. Learning how to properly mount components so as
to conserve space, minimize leads and avoid ground loops is
considered by the author to be a very useful and important aspect
of this thesis.

The electronic circuits of AESTR require 30 milliamperes
on both the plus and minus 15 volt supply terminals. When the
relay and "correct" light are activated, the current drain
increases to 95 milliamperes on both supply terminals. The
power is supplied by a Power Mate Power Supply, Model
DRA16-.2/16-.2. Its regulated output can be set between 15 and 17
volts and is rated to provide 200 milliamperes on the plus
and minus terminals. The voltage regulation is excellent even
during sudden current level changes when the light and relay
activate. Figure 15 shows the power distribution in AESTR.

As stated previously, the filter capacitors are mounted
on five pole, two gang switches. These components are located
longitudinally around the periphery of the switches so as to
economize on space and also obtain structural support.

Trouble shooting the system after AESTR was completely
wired consumed many hours. A component value error and cable
error required correcting before successful operation of the
assembled machine could be achieved.

Numerous minor problems were encountered in the fabrication
of AESTR. These difficulties did serve to prove the fact
that transition from theory to a practical working apparatus
is not a trivial matter.

Figure 15. AESTR Power Distribution
13. PRELIMINARY TEST RESULTS

AESTR was initially tested during the final phase of its
design stage by Dr. Gray at the school electronics laboratory.
At that time, the F1 and F2 low-pass filters had fixed cutoff
frequencies of 50 and 100 Hz respectively. His evaluation
of the machine indicated that the pass band of the filters
had to be reduced in order to have the machine properly
discriminate between the closely related voiced sounds such as
ER and E. Therefore the filters were redesigned to have a
series of discrete cutoff frequencies of 10, 15, 30 and 60 Hz.
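The effect of narrowing the pass bands can be illustrated with
an idealized model of the spectral window. The sketch assumes
that a channel responds when the beat frequency, taken as the
difference between a formant and its local oscillator, falls
within the low-pass cutoff, and it treats the filters as
brick-wall. All formant and oscillator values shown are
hypothetical and are not measurements from AESTR.

```python
def in_window(formant_hz, oscillator_hz, cutoff_hz):
    """True when the beat frequency lies inside the low-pass pass band."""
    return abs(formant_hz - oscillator_hz) <= cutoff_hz

def positive_response(f1, f2, osc1, osc2, cutoff1, cutoff2):
    """Both formant channels must respond for a positive machine response."""
    return in_window(f1, osc1, cutoff1) and in_window(f2, osc2, cutoff2)

# Hypothetical target settings and a hypothetical neighboring vowel.
OSC1, OSC2 = 490.0, 1350.0
NEIGHBOR_F1, NEIGHBOR_F2 = 530.0, 1420.0

wide = positive_response(NEIGHBOR_F1, NEIGHBOR_F2, OSC1, OSC2, 50.0, 100.0)
narrow = positive_response(NEIGHBOR_F1, NEIGHBOR_F2, OSC1, OSC2, 30.0, 30.0)
print(wide, narrow)
```

In this example the wide 50 Hz and 100 Hz windows accept the
neighboring vowel, while 30 Hz windows reject it.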

During this initial evaluation, it was also learned that
air streams impinging on the microphone cause a transient
response in AESTR of sufficient magnitude to activate the relay
circuit. To avoid this type of false response, the speaker
should hold the microphone in a vertical position approximately
four inches away from and slightly below his lips. In the case
of a child, a microphone headset type configuration similar
to the kind commonly worn by telephone operators would keep the
microphone properly positioned relative to the mouth of the
speaker.

After its fabrication, AESTR was tested by the author.
The machine control settings obtained for an adult male voice
and female voice articulation of the vowel sounds are listed
in Table 7. These settings represent the best values which
could be obtained for the smallest spectral window in the
F1-F2 plane. In all cases, the first formant of the vowel was
readily located with minimum sweeping of the F1 local
oscillator. The second formant was more difficult to locate for
vowel sounds IT, I, and ER. The F2 local oscillator must be
swept across its frequency range three or four times before
the operator is certain that the F2 frequency has been located.
This is to be expected since the amplitude of the second
formant is lower than that of the first formant for all vowel
sounds.

TABLE 7

AESTR CONTROL PANEL SETTINGS FOR VOWEL SOUNDS SPOKEN BY AN ADULT
MALE VOICE AND AN ADULT FEMALE VOICE

Pitch measurements were made according to the procedures
stated in section 5. Pitch frequencies are rapidly determined
and do show a variation with the vowel sounds as indicated
in Table 2.



TABLE 8

AESTR PITCH MEASUREMENTS OF AN ADULT MALE VOICE FOR VOWEL SOUNDS

Vowel         IT    I     E     AE    A     OW    U     OO    UH    ER
Pitch (Hz)    110   117   98    98    96    90    112   108   104   100

AESTR is now on loan to the Monterey Institute for Speech
and Hearing for field testing. Their preliminary operation
of the apparatus in conjunction with an M&M candy dispenser
revealed a new problem. Candy disks were being dispensed at
a very rapid rate since the relay opened and closed every time
the voice quivered in and out of the desired sound spectral
window. Therefore, to make AESTR provide only one reward item
with each sustained sound, the AND circuit was modified to
have a 250 millisecond delay before closing the relay contacts,
and once closed, the relay would not open for two seconds.
This modification consisted of choosing the correct shunt
capacitor values in the half-wave rectifier portion of the decision
and response circuit. A nominal value of 100 microfarads
working with the resistive elements of the circuit develops
build-up and decay time constants to meet the operating
specification for the relay.
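The relation between the delay specification and the RC time
constants can be sketched with a first-order model. This is an
illustration under stated assumptions, not the author's
calculation: the capacitor is taken to charge exponentially
toward a hypothetical steady rectified level of 12 volts and to
discharge exponentially from that level, with the relay
operating at the 7.5 volt threshold of section 11. The resistor
values in the actual circuit are not given here, so the
resistances printed below are hypothetical.

```python
import math

C_SHUNT = 100e-6     # nominal shunt capacitor, farads
V_THRESHOLD = 7.5    # AND gate threshold from section 11, volts
V_STEADY = 12.0      # hypothetical fully charged rectifier output, volts

def charge_tau(delay_s, v_final=V_STEADY, v_th=V_THRESHOLD):
    """Time constant for v_final * (1 - exp(-t/tau)) to reach v_th at t = delay_s."""
    return -delay_s / math.log(1.0 - v_th / v_final)

def discharge_tau(hold_s, v_start=V_STEADY, v_th=V_THRESHOLD):
    """Time constant for v_start * exp(-t/tau) to fall to v_th at t = hold_s."""
    return hold_s / math.log(v_start / v_th)

tau_up = charge_tau(0.250)     # 250 millisecond closing delay
tau_down = discharge_tau(2.0)  # two second hold after the sound stops
print(round(tau_up, 3), round(tau_up / C_SHUNT))
print(round(tau_down, 3), round(tau_down / C_SHUNT))
```

Under these assumptions the decay time constant must be much
longer than the build-up time constant, which is consistent
with separate charge and discharge paths around one capacitor.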

Dr. Gray and his associates tested AESTR for its ability
to discriminate the individual vowel sounds. The preliminary
results indicate that the machine, for certain vowels, will
give a positive response to not only the targeted vowel but
also to certain other vowel sounds. For example, AESTR can
be set to respond to OW and it will perform properly such that
the speaker is unable to cause a positive machine response
with any vowel sound other than OW. However, if AESTR is
targeted for the central vowel sound ER, the machine will
respond to ER plus the phonemes A, OW, U, OO and UH. The
apparent cause of this undesirable multi-sound response is
that ER has a relatively low intensity level
for its first and second formants when compared to back vowels,
especially OW. Unfortunately, the therapist has a greater
need to teach the ER rather than OW to speech handicapped
children. To improve AESTR's ability to respond strictly to
the ER sound, Dr. Gray and the author varied the "pitch"
control settings. The attempt indicated that some improvement
could be made if the "pitch" control is set to position "D".
Now the machine will respond only to ER and OW. The OW vowel
continues to mask all other vowels since it does contain the
greatest amount of energy throughout the audio frequency
spectrum.

A different approach was tried to overcome the ER ambiguity
response of AESTR. Both the F1 and F2 local oscillators
were set to the second formant frequency of 1480 Hz while
the "pitch" control remained in position "D". The volume
control was set to a value of 3 and the sensitivity controls were
set to a value of [illegible]. In this state, the machine would
respond only to the ER sound for a majority of trials. This can be
explained by noting that OW has both its F1 and F2 frequencies
below 1 KHz which are attenuated by the high-pass filter and
the harmonic components of OW near 1480 Hz are insufficient
to cause a positive response of the machine. Now, when a speaker
makes the ER sound, its second formant (near 1480 Hz) is not
attenuated by the high-pass filter and will provide a strong
beat frequency out of both F1 and F2 filters thus causing
AESTR to give a positive response. This type of machine
operating procedure will be investigated further and extended to
take advantage of the third formant information associated
with each vowel.

A speaker is able to cause AESTR to give a positive response
when he greatly increases the intensity of his voice. The
author recommends that some type of distortionless speech
compressor be inserted between the microphone and preamplifier.
Commercial devices are readily available to control the
microphone peak loudness yield.

A human limitation prevents AESTR from being operated
for more than 15 minutes by one speaker. After a person has
been producing voiced sounds for this period of time, he will
start becoming hyperventilated and experience dizziness. The
effect is analogous to a person blowing up a large balloon.
Dr. Gray is giving consideration to this factor and will
develop a clinical testing procedure to avoid hyperventilation
of the speaker.
14. CONCLUSIONS

The prototype apparatus does perform electrically in the
manner it was designed to operate but this does not imply that
AESTR is performing in a totally satisfactory manner from the
viewpoint of the speech therapist. AESTR is considered to be
approximately 50% successful in meeting the needs of the
therapist. With more operating data obtained from the machine
in future months, it is hoped that additional design criteria
can be established to improve AESTR's performance.

In addition to aiding speech handicapped children, AESTR
has potential applications to aid persons trying to learn
foreign vowel sounds. Also this apparatus can be used in an
auxiliary manner to measure tones of musical instruments such
as pianos or organs with a high degree of accuracy.

Speech processing and especially specific analysis of
spectral components of voiced sounds is a challenging task
from an engineering viewpoint. This fact became very apparent
from what appeared to be a very straightforward thesis subject.
APPENDIX I

Selected Glossary of Speech Terms

ARTICULATE. To produce a speech sound by the organs of speech.

ARTICULATION. The set of human bodily positions and movements
aiming at the production of speech sounds.

BACK. A vowel articulated by raising the back part of the
tongue towards the velum, e.g. sort.

CENTRAL. A vowel articulated by raising the central part of
the tongue towards the juncture of the palate and the
velum, e.g. first.

CONSONANT. A speech sound articulated by a complete closure
of the air passage or by a narrowing of it beyond the
vowel limit, e.g. go, or see.

DIPHTHONG. A vowel articulated by a deliberate movement of
the speech organs from one position into the other.

FRICATIVE. A consonant articulated by a narrowing of the
air-passage resulting in audible friction, e.g. shame.

FRONT. A vowel articulated by raising the front part of the
tongue towards the palate, e.g. get.

FULLY VOICED. A speech sound articulated by the vocal cords
vibrating during the whole of its articulation, e.g.
living or put.

ORGANS OF SPEECH. Those parts of the human body which are
active in the production of speech sounds, i.e. the lungs,
the trachea (windpipe), the vocal cords, the glottis,
the pharynx, the nose, the lips, the teeth, the alveoli
(teeth ridge), the palate (hard palate), the velum (soft
palate), the uvula, the tongue. The tongue is arbitrarily
divided into four parts: the tip, the blade, the
center and the back.

PHONEME. A class of distinctive speech sounds, the members
of which are (1) in complementary distribution with
each other, and (2) in opposition or contrast to any
other class of distinctive speech sounds. Thus, /d/ in
read and /d/ in middle are members of the same phoneme,
whereas /d/ in date and /l/ in late are members of two
different phonemes.

PHONEMICS. The scientific study of distinctive speech sounds.

PHONETICS. The scientific study of speech sounds.
PLOSIVE. A consonant articulated by a complete closure of the air
passage, combined with air-compression behind the closure, and
followed by an explosion in the release stage, e.g. kind.

SPEECH. A sequence of sounds articulated for the purpose of human
communication.

SYLLABLE. A structural unit capable of being connected as a whole
with one particular degree of accent, e.g. become.

VELUM. The soft palate of the oral cavity.

VOICED. A speech sound, consonant or vowel, articulated with the
vocal cords vibrating during the whole of its articulation, or
part of it, e.g. weather, park, one.

VOICELESS. A speech sound, especially a consonant, articulated with
no voicing, e.g. lucky.

VOWEL. A speech sound articulated with no closure of the air-passage
and no narrowing of it beyond the vowel limit, e.g. bad or most.

WORD. A structural unit separated in writing by spaces, e.g. bed
(one word), room (one word), bedroom (one word), textbook
(one word), a good subject (three words).
APPENDIX II

PHOTOGRAPHS OF PROTOTYPE EQUIPMENT

Figure 16. The AESTR System

Figure 18. AESTR Chassis

Figure 19. AESTR Circuit Boards